Modelling Disagreement Between Judges for Information Retrieval System Evaluation

Authors

Andrew Turpin

Falk Scholer

Abstract

The batch evaluation of information retrieval (IR) systems typically uses a testbed consisting of a collection of documents, a set of queries, and, for each query, a set of judgements indicating which documents are relevant. This paper presents a probabilistic model for predicting IR system rankings in a batch experiment when document relevance assessments come from different judges, using the precision-at-n (P@n) family of metrics. In particular, if a new judge agrees with the original judge at an agreement rate of α, then the probability distribution of the difference between the P@n scores of two systems is derived in terms of α. We then examine how the model could be used to predict system performance based on user evaluation of two IR systems, given a previous batch assessment of the two systems together with a measure of the agreement between the users and the judges who generated the original batch relevance judgements. Analysis of data collected in previous user experiments shows that simple agreement (α) between users varies widely between search tasks and information needs. A practical choice of model parameters from the available data is therefore difficult. We conclude that gathering agreement rates from users of a live search system requires careful consideration of topic and task effects.
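The kind of derivation the abstract describes can be illustrated with a small sketch. It assumes a deliberately simple model, which is not necessarily the one derived in the paper. Relevance is binary, and each of the new judge's judgements independently agrees with the original judge's with probability α: a document the original judge marked relevant stays relevant with probability α, and a non-relevant one becomes relevant with probability 1 - α. It also assumes the two systems' top-n lists share no documents, so their scores vary independently. Under these assumptions, the number of documents the new judge marks relevant in a top-n list is a sum of two binomial variables, and the distribution of the P@n difference follows by convolution. Simple agreement is computed as the fraction of documents that two judges label identically. All function names (`simple_agreement`, `relevant_count_pmf`, `p_at_n_difference_pmf`) are hypothetical.

```python
from math import comb


def simple_agreement(judge_a, judge_b):
    """Simple agreement: fraction of documents given the same binary label by both judges."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("expected two non-empty judgement lists of equal length")
    matches = sum(1 for a, b in zip(judge_a, judge_b) if a == b)
    return matches / len(judge_a)


def binom_pmf(k, m, p):
    """Probability of exactly k successes in m independent trials with success probability p."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def relevant_count_pmf(r, n, alpha):
    """PMF of how many of the top n documents the new judge marks relevant,
    given that the original judge marked r of them relevant.

    Assumption (hypothetical model): every judgement independently agrees
    with the original judge with probability alpha.
    """
    pmf = [0.0] * (n + 1)
    for kept in range(r + 1):  # originally relevant, still judged relevant
        p_kept = binom_pmf(kept, r, alpha)
        for gained in range(n - r + 1):  # originally non-relevant, now judged relevant
            pmf[kept + gained] += p_kept * binom_pmf(gained, n - r, 1 - alpha)
    return pmf


def p_at_n_difference_pmf(r1, r2, n, alpha):
    """Distribution of P@n(system 1) - P@n(system 2) under the new judge.

    r1, r2: relevant documents in each system's top n according to the original judge.
    Assumption: the two top-n lists are disjoint, so the two counts are independent.
    Returns {difference: probability}, ordered by difference.
    """
    pmf1 = relevant_count_pmf(r1, n, alpha)
    pmf2 = relevant_count_pmf(r2, n, alpha)
    by_count_diff = {}
    for k1, p1 in enumerate(pmf1):
        for k2, p2 in enumerate(pmf2):
            # Convolve the two count distributions on the difference k1 - k2.
            by_count_diff[k1 - k2] = by_count_diff.get(k1 - k2, 0.0) + p1 * p2
    # Dividing a count difference by n gives the P@n difference.
    return {d / n: p for d, p in sorted(by_count_diff.items())}


if __name__ == "__main__":
    alpha = simple_agreement([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 1, 0, 0, 1, 1, 0])
    dist = p_at_n_difference_pmf(r1=6, r2=4, n=10, alpha=alpha)
    print(f"alpha = {alpha:.2f}")
    print(f"P(system 1 still ahead) = {sum(p for d, p in dist.items() if d > 0):.3f}")
```

Summing the probabilities of positive differences estimates how often the new judge would still rank the first system above the second under these assumptions. If the two systems retrieve overlapping documents, their scores are positively correlated, so treating them as independent would overstate the variance of the difference.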

Similar sources

Retrieval–travel-time model for free-fall-flow-rack automated storage and retrieval system

Automated storage and retrieval systems (AS/RSs) are material handling systems that are frequently used in manufacturing and distribution centers. The modelling of the retrieval–travel time of an AS/RS (expected product delivery time) is practically important, because it allows us to evaluate and improve the system throughput. The free-fall-flow-rack AS/RS has emerged as a new technology for dr...

Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines

Purpose: Traditionally, many metrics have been used to evaluate search engines; nevertheless, various researchers have proposed new metrics in recent years. Awareness of these new metrics is essential for research on search engine evaluation. The purpose of this study was therefore to provide an analysis of important and new metrics for evaluating search engines. Methodology: This is ...

Cross-Evaluation: A new model for information system evaluation

In this article, we introduce a new information system evaluation method and report on its application to a collaborative information seeking system, AntWorld. The key innovation of the new method is to use precisely the same group of users who work with the system as judges, a system we call Cross-Evaluation. In the new method, we also propose to assess the system at the level of task complet...

Measuring the Agreement Among Relevance Judges

The importance of the issue of the agreement (or disagreement) between relevance judges is increasing, since new kinds of relevance judgment expression are being used (to the classical dichotomous one, various researchers have added scalar, weighted, and orders of various kinds) and new media are being introduced (it is far quicker to judge the relevance of an image than a text, and thus the huma...

Performance Evaluation of Medical Image Retrieval Systems Based on a Systematic Review of the Current Literature

Background and Aim: Images, as information vehicles that can convey a large volume of information, are important, especially in medicine. The existence of different attributes of image features and various search algorithms in medical image retrieval systems, together with the lack of an authority to evaluate the quality of retrieval systems, make a systematic review in medical image retrieval system...

Publication date: 2009